Work request processor

ABSTRACT

A network processor includes a schedule, sync and order (SSO) module for scheduling and assigning work to multiple processors. The SSO includes an on-deck unit (ODU) that provides a table having several entries, each entry storing a respective work queue entry (WQE), and a number of lists. Each of the lists may be associated with a respective processor configured to execute the work, and includes pointers to entries in the table. A pointer is added to the list based on an indication of whether the associated processor accepts the WQE corresponding to the pointer.

RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 13/285,773, filed Oct. 31, 2011. The entire teachings of the above application are incorporated herein by reference.

BACKGROUND

Typical network processors schedule and queue work such as packet processing operations for upper level network protocols, and allow processing with respect to upper level network protocols (e.g., transport and application layers) in received packets before forwarding the packets to connected devices. The functions typically performed by network processors include packet filtering, queue management and priority, quality of service enforcement, and access control. By employing features specific to processing packet data, network processors can optimize an interface of a networked device.

SUMMARY

Embodiments of the present invention provide a system for processing work requests in a network. An add work engine (AWE) forwards a work queue entry (WQE) to one of a plurality of input queues (IQs). An on-deck unit (ODU) provides a table having several entries, each entry storing a respective WQE, and a number of lists. Each of the lists may be associated with a respective processor configured to execute WQEs, and includes pointers to entries in the table. A pointer is added to the list based on an indication of whether the associated processor accepts the WQE corresponding to the pointer. A get work engine (GWE) moves WQEs from the plurality of IQs to the table of the ODU.

In further embodiments, the indication of acceptance is received from the associated processor itself, and can be based on a number of factors, such as a work group corresponding to the WQE, a comparison of a priority of the WQE against a priority of other WQEs stored at the list, or an identifier of the IQ storing the WQE.

In still further embodiments, the system can include a number of work slots. Each work slot can be associated with a respective processor and configured to receive a WQE from the list associated with the processor. The respective processor may execute the WQE at the work slot. Each of the lists can include pointers to a common WQE in the table, and each of the lists may be updated by removing a pointer when the associated WQE is moved to a work slot of a processor not associated with the list.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.

FIG. 1 is a block diagram illustrating a network services processor in which embodiments of the present invention may be implemented.

FIG. 2 is a block diagram of a system for scheduling and assigning work in one embodiment.

FIG. 3 is a block diagram of a top-level view of an on-deck unit (ODU) in one embodiment.

FIGS. 4A-B are block diagrams illustrating a portion of the ODU for maintaining a work list in one embodiment.

FIG. 5 is a block diagram illustrating the components of a work entry in one embodiment.

DETAILED DESCRIPTION OF THE INVENTION

Before describing example embodiments of the present invention in detail, an example network security processor in which the embodiments may be implemented is described immediately below to help the reader understand the inventive features of the present invention.

FIG. 1 is a block diagram illustrating a network services processor 100. The network services processor 100 delivers high application performance using at least one processor core 120.

The network services processor 100 processes Open System Interconnection network L2-L7 layer protocols encapsulated in received packets. As is well-known to those skilled in the art, the Open System Interconnection (OSI) reference model defines seven network protocol layers (L1-L7). The physical layer (L1) represents the actual interface, electrical and physical, that connects a device to a transmission medium. The data link layer (L2) performs data framing. The network layer (L3) formats the data into packets. The transport layer (L4) handles end-to-end transport. The session layer (L5) manages communications between devices, for example, whether communication is half-duplex or full-duplex. The presentation layer (L6) manages data formatting and presentation, for example, syntax, control codes, special graphics and character sets. The application layer (L7) permits communication between users, for example, file transfer and electronic mail.

The network services processor 100 may schedule and queue work (packet processing operations) for upper level network protocols, for example L4-L7, and allow processing of upper level network protocols in received packets to be performed to forward packets at wire-speed. Wire-speed is the rate of data transfer of the network over which data is transmitted and received. By processing the protocols to forward the packets at wire-speed, the network services processor does not slow down the network data transfer rate.

A packet is received for processing by a plurality of interface units 122. A packet can also be received by a PCI interface 124. The interface unit 122 performs pre-processing of the received packet by checking various fields in the L2 network protocol header included in the received packet and then forwards the packet to a packet input unit 126. At least one interface unit 122a can receive packets from a plurality of X Attachment Unit Interfaces (XAUI), Reduced X Attachment Unit Interfaces (RXAUI), or Serial Gigabit Media Independent Interfaces (SGMII). At least one interface unit 122b can receive connections from an Interlaken Interface (ILK).

The packet input unit 126 performs further pre-processing of network protocol headers (e.g., L3 and L4 headers) included in the received packet. The pre-processing includes checksum checks for TCP/User Datagram Protocol (UDP) (L4 network protocols).

A free-pool allocator 128 maintains pools of pointers to free memory in Level-2 cache memory 130 and external DRAM 108. The packet input unit 126 uses one of the pools of pointers to store received packet data in Level-2 cache memory 130 or external DRAM 108 and another of the pools of pointers to allocate work queue entries for the processor cores 120.

The packet input unit 126 then writes packet data into buffers in Level-2 cache 130 or external DRAM 108. Preferably, the packet data is written into the buffers in a format convenient to higher-layer software executed in at least one of the processor cores 120. Thus, further processing of higher level network protocols is facilitated.

The network services processor 100 can also include one or more application specific co-processors. These co-processors, when included, offload some of the processing from the cores 120, thereby enabling the network services processor to achieve high-throughput packet processing. For example, a compression/decompression co-processor 132 is provided that is dedicated to performing compression and decompression of received packets. Other embodiments of co-processing units include the RAID/De-Dup Unit 162, which accelerates data striping and data duplication processing for disk-storage applications.

Another co-processor is a Hyper Finite Automata (HFA) unit 160, which includes dedicated HFA thread engines adapted to accelerate pattern and/or signature matching necessary for anti-virus, intrusion-detection systems and other content-processing applications. Using an HFA unit 160, pattern and/or signature matching is accelerated, for example being performed at rates upwards of multiples of tens of gigabits per second. The HFA unit 160, in some embodiments, could include any of a Deterministic Finite Automata (DFA), Non-deterministic Finite Automata (NFA), or HFA algorithm unit.

An I/O interface 136 manages the overall protocol and arbitration and provides coherent I/O partitioning. The I/O interface 136 includes an I/O bridge 138 and a fetch-and-add unit 140. The I/O bridge includes two bridges, an I/O Packet Bridge (IOBP) 138a and an I/O Bus Bridge (IOBN) 138b. The I/O Packet Bridge 138a is configured to manage the overall protocol and arbitration and provide coherent I/O partitioning with primarily packet input and output. The I/O Bus Bridge 138b is configured to manage the overall protocol and arbitration and provide coherent I/O partitioning with primarily the I/O Bus. Registers in the fetch-and-add unit 140 are used to maintain lengths of the output queues that are used for forwarding processed packets through a packet output unit 146. The I/O bridge 138 includes buffer queues for storing information to be transferred between a coherent memory interconnect (CMI) 144, an I/O bus 142, the packet input unit 126, and the packet output unit 146.

The miscellaneous I/O interface (MIO) 116 can include auxiliary interfaces such as General Purpose I/O (GPIO), Flash, IEEE 802 two-wire Management Data I/O Interface (MDIO), Serial Management Interface (SMI), Universal Asynchronous Receiver-Transmitters (UARTs), Reduced Gigabit Media Independent Interface (RGMII), Media Independent Interface (MII), two wire serial interface (TWSI) and other serial interfaces.

The network services processor 100 may also include a Joint Test Action Group ("JTAG") Interface 123 supporting the MIPS EJTAG standard. According to the JTAG and MIPS EJTAG standards, a plurality of cores within the network services processor 100 will each have an internal Test Access Port ("TAP") controller. This allows multi-core debug support of the network services processor 100.

A Schedule/Sync and Order (SSO) module 148 queues and schedules work for the processor cores 120. Work is queued by adding a work queue entry to a queue. For example, a work queue entry is added by the packet input unit 126 for each packet arrival. A timer unit 150 is used to schedule work for the processor cores 120.

Processor cores 120 request work from the SSO module 148. The SSO module 148 selects (i.e., schedules) work for one of the processor cores 120 and returns a pointer to the work queue entry describing the work to the processor core 120.

The processor core 120, in turn, includes instruction cache 152, Level-1 data cache 154, and crypto-acceleration 156. In one embodiment, the network services processor 100 includes 32 superscalar Reduced Instruction Set Computer (RISC)-type processor cores 120. In some embodiments, each of the superscalar RISC-type processor cores 120 includes an extension of the MIPS64 version 3 processor core. In one embodiment, each of the superscalar RISC-type processor cores 120 includes a cnMIPS II processor core.

Level-2 cache memory 130 and external DRAM 108 are shared by all of the processor cores 120 and I/O co-processor devices. Each processor core 120 is coupled to the Level-2 cache memory 130 by the CMI 144. The CMI 144 is a communication channel for all memory and I/O transactions between the processor cores 120, the I/O interface 136 and the Level-2 cache memory 130 and controller. In one embodiment, the CMI 144 is scalable to 32 processor cores 120, supporting fully-coherent Level-1 data caches 154 with write through. Preferably the CMI 144 is highly-buffered with the ability to prioritize I/O. The CMI is coupled to a trace control unit 164 configured to capture bus requests, so software can later read the requests and generate a trace of the sequence of events on the CMI.

The Level-2 cache memory controller 131 maintains memory reference coherence. It returns the latest copy of a block for every fill request, whether the block is stored in Level-2 cache memory 130, in external DRAM 108, or is "in-flight." It also stores a duplicate copy of the tags for the data cache 154 in each processor core 120. It compares the addresses of cache-block-store requests against the data-cache tags, and invalidates (both copies) a data-cache tag for a processor core 120 whenever a store instruction is from another processor core or from an I/O component via the I/O interface 136.

In some embodiments, a plurality of DRAM controllers 133 supports up to 128 gigabytes of DRAM. In one embodiment, the plurality of DRAM controllers includes four DRAM controllers, each of the DRAM controllers supporting 32 gigabytes of DRAM. Preferably, each DRAM controller 133 supports a 64-bit interface to DRAM 108. Additionally, the DRAM controller 133 can support preferred protocols, such as the DDR-III protocol.

After a packet has been processed by the processor cores 120, the packet output unit 146 reads the packet data from the Level-2 cache memory 130 or DRAM 108, performs L4 network protocol post-processing (e.g., generates a TCP/UDP checksum), forwards the packet through the interface units 122 or the PCI interface 124, and frees the L2 cache memory 130/DRAM 108 used by the packet.

The DRAM controllers 133 manage in-flight transactions (loads/stores) to/from the DRAM 108. In some embodiments, the DRAM controllers 133 include four DRAM controllers, the DRAM 108 includes four DRAM memories, and each DRAM controller is connected to a DRAM memory. The HFA unit 160 is coupled directly to the DRAM controllers 133 on a bypass-cache access path 135. The bypass-cache access path 135 allows the HFA unit to read directly from the memory without using the Level-2 cache memory 130, which can improve efficiency for HFA operations.

Embodiments of the present invention may be implemented in the network services processor 100 shown in FIG. 1, and may be directed more particularly to the schedule, sync and order (SSO) module 148 and associated components. Example embodiments are described in further detail below with reference to FIGS. 2-5.

FIG. 2 is a block diagram of a schedule, sync and order (SSO) module 200 in one embodiment. The SSO module 200 may be implemented in the network processor 100 (as SSO module 148) described above with reference to FIG. 1, and operates to queue and schedule work for multiple processor cores 270A-N. The SSO module 200 includes an add work engine (AWE) 210, which receives external requests to process packet data and creates work queue entries (WQEs) based on those requests. A number of input queues (IQ) 215A-N store the WQEs. A get work engine (GWE) 230 retrieves the WQEs from the IQs 215A-N, and an on-deck unit (ODU) 240 schedules the work for assignment at the processor cores ("processors") 270A-N. A number of work slots (WS) 260A-N each correspond to a particular processor 270A-N, and hold a WQE for completion by the processor 270A-N. Additional components not shown, such as memory arrays and bus interfaces, may be implemented to support the operation of the above components.

Typical scheduling circuits for multiple processors have employed a work queue that is shared by the multiple processors, where each of the processors includes a work slot having a pointer to an entry in the work queue. In the case that several (or all) processors can execute the same work, a single task may be scheduled for several processors. As a result, only the first processor receives the work, the scheduling at the additional processors is invalidated, and those additional processors must wait for additional jobs to be added to the work queue.

Embodiments of the present invention provide for scheduling work for multiple processors at a high frequency while maximizing the work capacity of those processors. In an example process for scheduling and assigning work at the SSO 200, the AWE 210 adds a WQE to one of several (e.g., eight) IQs 215A-N. The GWE 230 traverses the IQs 215A-N and places pointers to executable WQEs into the ODU 240. When a processor 270A-N requests new work, the WS 260A-N associated with the processor 270A-N delivers the WQE referenced by the ODU's 240 pointer.

The ODU 240 may include a central multi-entry (e.g., 32-entry) table storing WQE information, and multiple (e.g., 32) lists, each associated with a respective processor 270A-N, where each list can hold pointers to multiple (e.g., four) entries in the central table. The four pointer storage elements per list are also referred to as "slots." The pointers in each list may be sorted by priority, which is derived from the respective IQ 215A-N and which can be different across the 32 lists.
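
For illustration, the storage just described can be modeled with a short data-structure sketch in C. This is a minimal model, not the hardware design itself; all type and field names (odu, odu_entry, odu_list) are hypothetical, and the entry states mirror the EMPTY/VALID/PENDING states discussed below.

```c
#include <stdint.h>
#include <stdbool.h>

#define ODU_ENTRIES 32   /* central table size (e.g., 32 entries) */
#define ODU_LISTS   32   /* one list per processor core */
#define LIST_SLOTS  4    /* pointer storage elements ("slots") per list */

typedef enum { ENTRY_EMPTY, ENTRY_VALID, ENTRY_PENDING } entry_state;

typedef struct {
    entry_state state;
    uint16_t idx;        /* WQE identifier within its IQ linked list */
    uint8_t  grp;        /* work group number */
    uint8_t  iq;         /* input queue number */
    uint32_t tag;        /* WQE tag */
} odu_entry;

typedef struct {
    bool    valid[LIST_SLOTS];   /* slot holds a live entry number */
    uint8_t entry[LIST_SLOTS];   /* index into the central table */
    uint8_t prio[LIST_SLOTS];    /* priority derived from the WQE's IQ */
} odu_list;

typedef struct {
    odu_entry table[ODU_ENTRIES];
    odu_list  lists[ODU_LISTS];  /* slot 0 is the front (head) of a list */
} odu;
```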

Operation of the ODU 240 in an example embodiment, in particular with respect to the scheduling of entries and the control and maintenance of the table and lists, is described below with reference to FIGS. 3-5.

FIG. 3 is a block diagram of a top-level view of an on-deck unit (ODU) in one embodiment. With reference to FIG. 2, in this example embodiment, there are 32 lists ("List0"-"List31"), one associated with each processor 270A-N and its respective WS 260A-N. The entry table includes 32 entries ("Entry0"-"Entry31"). The GWE 230 provides an indication of incoming WQEs to be provided to the ODU. In response to this indication, ODU control logic for each list determines whether to request to load an incoming WQE to that list. The entries compare an identifier ("idx") of the incoming WQE, which uniquely identifies the WQE, against the valid identifiers in the table. If there is a match, and the entry is not pending, the entry number to be loaded into the lists will be the entry number where the memory value matches. If the entry is pending, no list is allowed to load the incoming WQE because the WS is already processing that WQE. If there is no match, the ODU looks for a free entry to use by finding the first entry that is both not valid and corresponds to the number of a list that wants to load the incoming WQE. This behavior guarantees that a list that wants only uncommon WQEs will not be blocked by other, more requested WQEs occupying all the entries. If no entry satisfies the search criteria, the WQE is not loaded into the entry table or any of the lists. If the WQE is to be loaded, the top-level logic sends the entry number to be used to all the lists, and also to the entry table if the WQE is not already loaded into an entry. If the WQE is not to be loaded, the top-level control logic sends control signals to all the lists indicating not to load the WQE.
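
The entry-selection behavior described above can be sketched as follows, reusing the types from the previous sketch. The function name and the "want" bit-vector argument (one bit per list that requested the WQE) are hypothetical conveniences, not hardware signal names.

```c
/* Pick the table entry for an incoming WQE, or return -1 if it must not
 * be loaded. */
int odu_pick_entry(const odu *o, uint16_t incoming_idx, uint32_t want)
{
    /* 1. CAM-style match: is this WQE already held in an entry? */
    for (int e = 0; e < ODU_ENTRIES; e++) {
        if (o->table[e].state != ENTRY_EMPTY &&
            o->table[e].idx == incoming_idx)
            /* A pending entry is already being processed by a WS, so no
             * list may load it. */
            return o->table[e].state == ENTRY_PENDING ? -1 : e;
    }
    /* 2. No match: take the first entry that is both free and numbered
     * like a list that wants the WQE, so a list that wants only uncommon
     * WQEs cannot be starved of entries by more popular work. */
    for (int e = 0; e < ODU_ENTRIES; e++) {
        if (o->table[e].state == ENTRY_EMPTY && (want & (1u << e)))
            return e;
    }
    return -1;  /* no entry satisfies the search; do not load */
}
```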

When information about a WQE is presented at the end of the GWE 230 pipeline, all 32 lists in the ODU 240 determine whether to use one of their slots for this WQE. The determination is based on a) whether the processor 270A-N associated with the list accepts work from a group the WQE belongs to (based on a group tag associated with the WQE), b) whether the priority of this WQE (based on a priority tag) is better than the worst-priority WQE stored in the list's slots, and c) whether the list can accept work from the IQ 215A-N holding the WQE. For the purposes of (b), any slots in the list not pointing to valid WQEs may be treated as having a priority one degree lower than the worst permitted priority.
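
A sketch of this per-list acceptance test follows. The configuration fields (grp_mask, iq_prio, iq_restart) are hypothetical stand-ins for the per-processor programming the text describes, and the convention that lower numeric values denote better priority is an assumption for illustration only.

```c
typedef struct {
    uint64_t grp_mask;      /* bit n set => this core accepts group n */
    uint8_t  iq_prio[8];    /* 4-bit priority assigned to each of 8 IQs */
    uint8_t  iq_restart;    /* bit q set => IQ q must restart first */
} list_cfg;

bool list_accepts(const odu_list *l, const list_cfg *cfg,
                  uint8_t grp, uint8_t iq)
{
    if (!(cfg->grp_mask >> grp & 1))   /* (a) wrong work group */
        return false;
    if (cfg->iq_restart >> iq & 1)     /* (c) IQ marked for restart */
        return false;
    /* (b) the incoming priority must beat the worst slot; an invalid
     * slot counts as one degree worse than the worst permitted 4-bit
     * priority (15), i.e. 16. */
    uint8_t worst = 0;
    for (int s = 0; s < LIST_SLOTS; s++) {
        uint8_t p = l->valid[s] ? l->prio[s] : 0xF + 1;
        if (p > worst)
            worst = p;
    }
    return cfg->iq_prio[iq] < worst;   /* lower value = better priority */
}
```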

The ODU 240 examines the vector of lists that request using a slot for the incoming WQE and matches that vector against the state of the table. The ODU 240 assigns an entry number for the incoming WQE that is the first match of the intersection of lists that request the WQE and entries that are empty. The entry in the central table then loads the WQE information and marks the state of the entry VALID, while the lists that want the WQE store the assigned entry number and mark the state of the slot storing the entry number VALID. Selecting the new entry from the AND of the lists and available entries ensures that a list that only infrequently sees qualifying WQEs will have an entry available for those rare WQEs.
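
This intersection step can be expressed compactly with bit vectors, as in the following sketch (the __builtin_ctz intrinsic is a GCC/Clang feature used here only for first-set-bit selection):

```c
/* New entry number = first set bit of (lists that want the WQE) AND
 * (entries that are empty), using the convention that list n may claim
 * entry n. Returns -1 when no requester can be satisfied. */
int odu_assign_entry(uint32_t want_vec, uint32_t empty_vec)
{
    uint32_t candidates = want_vec & empty_vec;
    if (candidates == 0)
        return -1;                      /* no free entry for any requester */
    return __builtin_ctz(candidates);   /* lowest-numbered match */
}
```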

If the ODU 240 is to load a new entry to a list, but the incoming WQE would not be one of the top-4 WQEs for the associated processor 270A-N in terms of priority, the list does not request that the WQE be loaded, but instead may configure a bit in a register or other identifier, the identifier corresponding to the IQ 215A-N from which the WQE was read. As a result of this configuration, the list may not request loading any other work from that IQ 215A-N until that IQ 215A-N has undergone a restart, thereby maintaining ordering of WQEs in the event that space on the list becomes available after the initial rejection of a WQE.

The valid state of each list, and the entry number at the front of each list, are output to the corresponding WS 260A-N. If the priority of the WQE pointed to by the first slot in a list is of the highest priority available for the associated processor 270A-N, the ODU 240 notifies the WS 260A-N that the list is both VALID and READY.

When the WS 260A-N receives a GET_WORK request from the associated processor 270A-N, and the request is to be satisfied from the IQs 215A-N, the WS 260A-N notifies the ODU 240 with the entry number of the accepted work. In the central table of the ODU 240, the state of the entry is changed from VALID to PENDING. At this time, the table also transmits to the WS 260A-N most of the information associated with the WQE (i.e., group number, tag, address in memory). In the lists, the accepted entry number is invalidated (i.e., the state changes from VALID to EMPTY), and all other entries on lists "behind" the accepted entry move forward one slot toward the head of their lists. The process may be limited such that only one entry is accepted per cycle.
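
The accept step can be sketched with the types from the first code example; the compaction loop models the slots "behind" the accepted entry moving one position toward the head of each list.

```c
/* Accept: the entry goes VALID -> PENDING in the table, and every list
 * drops any slot that references it, compacting later slots forward. */
void odu_accept(odu *o, uint8_t entry_no)
{
    o->table[entry_no].state = ENTRY_PENDING;
    for (int n = 0; n < ODU_LISTS; n++) {
        odu_list *l = &o->lists[n];
        for (int s = 0; s < LIST_SLOTS; s++) {
            if (l->valid[s] && l->entry[s] == entry_no) {
                /* remove the slot and shift the rest toward the head */
                for (int t = s; t < LIST_SLOTS - 1; t++) {
                    l->valid[t] = l->valid[t + 1];
                    l->entry[t] = l->entry[t + 1];
                    l->prio[t]  = l->prio[t + 1];
                }
                l->valid[LIST_SLOTS - 1] = false;
                break;  /* an entry appears at most once per list */
            }
        }
    }
}
```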

When the WS 260A-N has completed the acceptance process and extracted the WQE from the IQs 215A-N, it again notifies the ODU 240, and the central table changes the state of the entry from PENDING to EMPTY. No invalidates are required in the lists at extract time, because the number of the extracted entry had already been removed from any list in which it had been present.
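
The extract step is correspondingly simple in sketch form, since the lists were already updated at accept time:

```c
/* Extract: the WS has pulled the WQE out of its IQ, so the table entry
 * is simply freed; no list updates are needed here. */
void odu_extract(odu *o, uint8_t entry_no)
{
    o->table[entry_no].state = ENTRY_EMPTY;
}
```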

Entries in the table can also be invalidated for other reasons: a) the WS 260A-N could switch onto an atomic tag that matches the atomic tag of a WQE (or WQEs) in the table, b) the switch onto the atomic tag is due to a WQE being accepted from the table matching the atomic tag of another WQE (or WQEs) in the table, c) bumping (described further below), d) a flush of the entire contents of the table (other than entries in the PENDING state) due to a change in the group-mask or priority programming, and e) garbage collection (described further below). Any time an entry is invalidated, any slots corresponding to the invalidated entry numbers may be freed, and all other entries on lists "behind" the invalidated entries move forward toward the heads of their lists. Notification may also be sent to the WS 260A-N so that the WS 260A-N can stop processing of any GET_WORK request using that entry.

The central entry table may include comparators on the WQE tag values in each entry to allow for parallel computation of cases (a) and (b) above. The central entry table may also include comparators on the WQE pointer tuples that effectively receive copies of commands sent to a local memory, allowing the entry table values for pointer tuples to remain up to date with the IQs 215A-N as represented in the memory.

A WQE that is requested by one or more processors 270A-N, and that had been loaded into the central entry table, can be pushed out of all the lists it was on due to those lists loading better-priority WQEs in front of it. Once an entry is not on any lists, the entry must be "garbage-collected." Accordingly, the ODU 240 may detect cases where an entry is valid but the entry number is not on any list, and invalidate the entry.
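
A sketch of this garbage-collection check follows: it scans for entries that are valid yet referenced by no list slot, and frees them. A hardware implementation would use parallel match logic rather than loops; the loops here are for illustration only.

```c
/* Garbage collection: invalidate any VALID entry no longer held by any
 * list slot (it was pushed off every list by better-priority WQEs). */
void odu_garbage_collect(odu *o)
{
    for (int e = 0; e < ODU_ENTRIES; e++) {
        if (o->table[e].state != ENTRY_VALID)
            continue;
        bool referenced = false;
        for (int n = 0; n < ODU_LISTS && !referenced; n++)
            for (int s = 0; s < LIST_SLOTS; s++)
                if (o->lists[n].valid[s] && o->lists[n].entry[s] == e) {
                    referenced = true;
                    break;
                }
        if (!referenced)
            o->table[e].state = ENTRY_EMPTY;
    }
}
```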

The GWE 230 may present a high-priority WQE that should be placed at the front of one or more lists. To accommodate this, a WQE already in the table must be "bumped" and replaced by the new WQE. There are several algorithms that can be implemented to select the entry to be bumped, such as a) the lowest-weighted entry, where weight is a function of how many lists an entry appears on and how close to the front the entry is on each list, or b) the entry that appears furthest from the front of any list, regardless of how many lists the entry may appear on. The entry being bumped is invalidated from any list in which it appears. Simultaneously, the new WQE is loaded into the central entry table, while any list that wanted the new WQE loads the bumped entry number into the proper slot, adjusting other slots as appropriate. The same list may invalidate and reload the entry number in the same cycle.
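
Victim selection under option (a) can be sketched as follows; the specific weighting (head slots counting more than tail slots) is one plausible choice, not a function specified in the text.

```c
#include <limits.h>

/* Pick the lowest-weighted VALID entry to bump, weighting each entry by
 * how many list slots hold it and how close each slot is to the head. */
int odu_pick_bump_victim(const odu *o)
{
    int victim = -1, best = INT_MAX;
    for (int e = 0; e < ODU_ENTRIES; e++) {
        if (o->table[e].state != ENTRY_VALID)
            continue;
        int weight = 0;
        for (int n = 0; n < ODU_LISTS; n++)
            for (int s = 0; s < LIST_SLOTS; s++)
                if (o->lists[n].valid[s] && o->lists[n].entry[s] == e)
                    weight += LIST_SLOTS - s;  /* head slots weigh more */
        if (weight < best) {
            best = weight;
            victim = e;
        }
    }
    return victim;
}
```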

The top-level control logic may also be responsible for informing the GWE when it needs to restart due to a WQE being accepted, bumping, ejecting, or garbage collecting.

FIGS. 4A-B are block diagrams illustrating a portion of the ODU for maintaining a work list in one embodiment. With reference to FIG. 2, while the GWE 230 is processing a WQE for providing to the ODU 240, it sends a preview of the WQE's group number (grp) and IQ number (iq) to all the ODU lists in parallel. Each ODU list checks whether the grp is allowed for the processor 270A-N associated with the list by selecting one bit from the associated tag. In parallel, each ODU list determines the priority of the incoming WQE by using the iq value to select a 4-bit priority value from the associated tag. Also in parallel, each ODU list determines if it has marked the IQ 215A-N as requiring a restart.

If the incoming WQE's grp is allowed for this list, and the priority is higher than the priority of the entry presently in slot 3 (the lowest-priority slot of the list), and the IQ 215A-N does not need a restart, then the list sends a request indication to the ODU control logic and indicates which slot it believes the incoming WQE should occupy on the list. If the grp is allowed, but the priority is lower than or the same as the entry in slot 3, then the list sends a corresponding indication to the ODU top-level logic, and marks the IQ 215A-N of the "missed" WQE as needing to restart before this list can accept any new WQEs from that IQ 215A-N. The priority comparison is facilitated by using the valid bit of each slot, negated, as the most-significant bit of the priority value, meaning that any slot without a valid entry will automatically be processed at a lower priority than any non-reserved priority value.
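
The negated-valid-bit trick can be illustrated directly: the 5-bit comparison key is the negated valid bit followed by the 4-bit priority, under the assumed lower-is-better numeric convention. The helper name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Build a 5-bit comparison key: bit 4 = !valid (MSB), bits 3..0 = the
 * 4-bit priority. An empty slot therefore always compares worse (larger)
 * than any slot holding a valid entry. */
static inline uint8_t slot_key(bool valid, uint8_t prio4)
{
    return (uint8_t)(((!valid) << 4) | (prio4 & 0xF));
}

/* Usage: the incoming WQE wins slot s when
 *   slot_key(true, incoming_prio) < slot_key(slot_valid, slot_prio). */
```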

A slot of a list can be invalidated through several means. For example, "accepts" sent from a respective WS 260A-N invalidate any slot where the slot's entry number matches the accepted entry number. Bumps, as described above, may invalidate all slots following the slot where the new entry has been added. Further, the GWE 230 can indicate that the WQE added on the prior cycle should not have been added, and thus, if it was added, the slot that just added the WQE will be invalidated. In addition, an invalidate vector from the entry table can indicate anywhere from 0 to 32 entries that should be invalidated for any of the reasons described above with reference to FIG. 3.

Each list may determine which (if any; from zero to four) of its slots should be invalidated each cycle. Based on information from the ODU top-level control, each list determines which of its slots should add a new WQE. The list uses "invalidate" and "insert" control signals to insert the incoming WQE's entry number into the proper slot, and to adjust the other slots appropriately.

FIG. 5 is a block diagram illustrating the components of a work entry stored at an ODU table in one embodiment. Each of the 32 entries of the table may store the following information for a WQE:

a) The index (idx), previous (prev), and next (next) pointers (each 11 bits) describing the location of the WQE in the IQ's linked list structure. Sometimes these three values together are referred to as a pointer tuple. Associated with the prev pointer is a "head" bit; if set, it indicates the idx is at the head of some IQ and the prev value is not meaningful. Associated with the next pointer is a "tail" bit; if set, it indicates the idx is (or was at some point) at the tail of some IQ and the next value is not meaningful.
b) The group number (grp) of the WQE—6 bits.
c) The IQ number (iq) of the WQE—3 bits.
d) The tag (tag, 32 bits) and tag-type (ttype, 2 bits) of the WQE.
e) The WQE pointer (wqp) of the WQE—35 bits.
f) A valid bit and a pending bit.
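
The fields listed above map naturally onto a bit-field sketch. C bit-fields are used purely for illustration (64-bit bit-field base types are a common compiler extension), so the packing is compiler-defined rather than the hardware layout.

```c
#include <stdint.h>

struct odu_table_entry {
    uint64_t idx     : 11;  /* position of the WQE in its IQ linked list */
    uint64_t prev    : 11;  /* previous pointer of the tuple */
    uint64_t next    : 11;  /* next pointer of the tuple */
    uint64_t head    : 1;   /* set: idx is at an IQ head; prev not meaningful */
    uint64_t tail    : 1;   /* set: idx is (or was) at an IQ tail; next not meaningful */
    uint64_t grp     : 6;   /* group number */
    uint64_t iq      : 3;   /* IQ number */
    uint64_t tag     : 32;  /* WQE tag */
    uint64_t ttype   : 2;   /* tag type */
    uint64_t wqp     : 35;  /* WQE pointer */
    uint64_t valid   : 1;
    uint64_t pending : 1;
};
```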

The GWE 230 sends the above information to the ODU 240. Within each entry are a number of comparators ("CAMs") used to determine the following:

a) whether an incoming WQE has already been stored in a valid entry (idx vs. incoming idx),
b) whether an incoming WQE is allowed to bump a valid entry (iq vs. incoming iq),
c) when prev and next pointers should be updated due to changes in the IQ linked lists,
d) whether a valid WQE should be invalidated due to a WS switching onto an ATOMIC tag, and
e) whether a valid WQE should be invalidated due to a WS accepting another entry using the same ATOMIC tag.

In addition to the conditions of (d) and (e) above, entries can also be invalidated when the full ODU is flushed due to a configuration change, or by garbage collection when no list holds the entry number any longer. When an entry is invalidated, or multiple entries are simultaneously invalidated, all the ODU lists and the WS are informed with an indication signal.

While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

What is claimed is:
1. A scheduling processor for scheduling work for a plurality of processors, the scheduling processor comprising: an add work engine (AWE) configured to forward a work queue entry (WQE) to one of a plurality of input queues (IQs); an on-deck unit (ODU) comprising a table having a plurality of entries, each entry storing a respective WQE; and a plurality of lists, each of the lists being associated with a respective processor configured to execute WQEs and comprising a plurality of pointers to entries in the table, each of the lists adding a pointer based on an indication of whether the associated processor accepts the WQE corresponding to the pointer; and a get work engine (GWE) configured to move WQEs from the plurality of IQs to the table of the ODU.
2. The scheduling processor of claim 1, wherein the indication is received from the associated processor itself.
3. The scheduling processor of claim 1, wherein the indication is based on one or more of: a work group corresponding to the WQE, a comparison of a priority of the WQE against a priority of other WQEs stored at the list, and an identifier of the IQ storing the WQE.
4. The scheduling processor of claim 1, further comprising a plurality of work slots, each of the work slots being associated with a respective processor and configured to receive a WQE from the list associated with the processor.
5. The scheduling processor of claim 4, wherein the respective processor executes the WQE at the work slot.
6. The scheduling processor of claim 4, wherein each of the lists includes pointers to a common WQE in the table.
7. The scheduling processor of claim 6, wherein each of the lists is updated by removing a pointer when the associated WQE is moved to a work slot of a processor not associated with the list.
8. A method of processing work requests in a network, comprising: forwarding a work queue entry (WQE) to one of a plurality of input queues (IQs); moving the WQE from the IQ to one of a plurality of entries at a table; configuring a plurality of lists to store a plurality of pointers to entries in the table, each of the lists being associated with a respective processor configured to execute WQEs; and adding a pointer to one of the plurality of lists based on an indication of whether the associated processor accepts the WQE corresponding to the pointer.
9. The method of claim 8, wherein the indication is received from the associated processor itself.
10. The method of claim 8, wherein the indication is based on one or more of: a work group corresponding to the WQE, a comparison of a priority of the WQE against a priority of other WQEs stored at the list, and an identifier of the IQ storing the WQE.
11. The method of claim 8, further comprising receiving a WQE from the list at one of a plurality of work slots, each of the work slots being associated with a respective processor.
12. The method of claim 11, further comprising executing, at the respective processor, the WQE at the work slot.
13. The method of claim 11, wherein each of the lists includes pointers to a common WQE in the table.
14. The method of claim 13, further comprising updating each of the lists by removing a pointer when the associated WQE is moved to a work slot of a processor not associated with the list.